My All in One Metric, MAMBA?

Originally, I had a very long section on the background of All in Ones, my opinions on them, and some personal caveats that flowed into the justification for why I built this metric the way I did. I thought this blog post was 7 pages of text; to my horror, I learned it was 30 pages and CTRL+A lied to me. I also learned the 2024 dataframe did not include rookies. So I will keep this much briefer, but the much more in-depth sections explaining All in Ones, delving into my reservations with them, the justifications for those reservations, and how that leads to the framework here are in the lengthy version of this blog post, linked at the bottom.

Here is a very quick summary of that section, as it is important to understand the thought process behind creating this. (To keep this post pinned, it will always show as published after any other post I make, but the metric itself was made at the start of September.)

  • APM is the base form of RAPM (the R stands for regularization, so APM is RAPM without it). It tries to measure how impactful Player X is via his effect on scoring margin, controlling for the 9 other players on the court. However, it struggles with multicollinearity: it has trouble assigning the right credit to teammates who share the floor a lot. To understand RAPM and Bayesian All in Ones, use a metaphor: your really insightful friend is watching basketball for the first time, and he's shouting out a number for how good he thinks each player is (impact ≠ goodness, but let's ignore that for now). The problem is he thinks everyone on a team is great because the team keeps winning by 100.

  • Ridge regression is typically what the "R" in RAPM is, and it shrinks those estimates toward 0. Think of it this way: as your friend shouts out his numbers, you keep telling him every single player is actually a 0. That sounds odd, but as you keep doing it, he starts to figure out who the true standouts are. Your "trolling" will affect his opinion on many players; he might look at a random player and think "hmm, my friend says he's a 0, maybe I'm overrating him," but he'll watch 2018 LeBron drop a million points and think "my friend is tripping." (A rough code sketch of this shrinkage appears right after this list.) While this helps a ton in practice, even though your friend is super smart, you only have a limited number of games to show him, so he might not have enough film to accurately parse out who's good and bad. The other issue with RAPM is sample size: one-year RAPM is very noisy (impact data at that scale is noisy in general).

  • All in Ones: instead of saying everyone is a 0, you help your friend out by giving him a number for each player representing how good you think they are. This is the box score component, the SPM, or simply the prior. This has two effects. First, it is intuitively a much better baseline to have some separate measurement of how good each player is (so you tell your friend different ratings for Jokic and Tristan Thompson, instead of calling them both 0s and equal). Second, it ends up being a nudge that lets your friend reach the more accurate answers a bit faster.

  • This makes All in Ones far superior to RAPM or APM. However, in my opinion, while this alleviates issues with noise, it in turn creates bias. You are applying a linear model (more on why other types of priors don't work well in the long doc; just assume it's linear and ignore XGBoost and the like if you know that stuff) to capture trends across every player. It will naturally undershoot and overshoot on certain players, because box scores don't tell you the whole story. This doesn't make All in Ones worse than RAPM or APM overall, as they end up far, far more accurate. But with some players (LeBron on defense post-Miami, likely KG), you can see they are consistently hurt by All in Ones (at least relative to other superstars) because of this bias, when cross-referencing RAPM for players of "similar stature." You trade noise for bias: a massive improvement overall, of course, but it can consistently cause issues for some individuals.
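
Since the friend metaphor maps almost one-to-one onto the actual math, here is a minimal sketch of the whole APM → RAPM → All in One progression, assuming a stint-level design matrix; the function and data layout are an illustration, not lifted from my actual code.

```python
import numpy as np

def rapm(X: np.ndarray, y: np.ndarray, lam: float = 0.0,
         prior: np.ndarray | None = None) -> np.ndarray:
    """One ridge solve covers all three cases:
      lam=0, prior=None -> plain APM (noisy; a near-singular X'X is exactly
                           the multicollinearity problem described above)
      lam>0, prior=None -> RAPM: shrink everyone toward 0
      lam>0, prior=SPM  -> All in One: shrink toward a box-score prior
    X: stints x players (+1 on offense, -1 on defense, 0 off the floor)
    y: scoring margin per 100 possessions for each stint."""
    n = X.shape[1]
    if prior is None:
        prior = np.zeros(n)  # "every single player is actually a 0"
    # Ridge with a nonzero shrinkage target:
    #   beta = prior + (X'X + lam*I)^(-1) X'(y - X @ prior)
    return prior + np.linalg.solve(X.T @ X + lam * np.eye(n),
                                   X.T @ (y - X @ prior))
```

Raising `lam` is you insisting harder on your numbers; PIRAPM, which comes up later, is this same solve with last year's RAPM passed in as the `prior`.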


METRIC BREAKDOWN
So essentially, for now, my metric has 2 main innovations and a few smaller tweaks that I believe improve it, which I will split into the Box Score Component and the Impact Component.

BOX SCORE COMPONENT

  • The Box Score Component uses typical box score data scaled to per 75 possessions (possession-adjusted, in the same vein as per-100 or per-36 numbers), and blends in some very conservative use of Synergy data and tracking data.

  • Assists were replaced with assist points created, blocks with Rim Points Saved (DFGA × DFG% differential × 2), unassisted FGM was used, and charges drawn were included.

  • I created a metric called SynergyPlayTypePOE: points over expectation based on a player's play type frequency distribution and efficiency, as a way to account for shot quality and for players who are incredible at doing hard things. This was by far the biggest boost within the box score prior. (Both this and Rim Points Saved are sketched in code right after this list.)

  • General inputs, like % of games started and team Off/Def Rtg × % of team minutes a player played, which appear in other priors like PIPM, were also used.

  • Offense came out far better than defense; the box score prior is certainly still in its alpha stage, especially on defense.
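
Since I only spelled out the Rim Points Saved formula above, here is a rough sketch of that feature plus the general shape of SynergyPlayTypePOE; the column names and exact POE construction are illustrative assumptions, not the production code.

```python
def rim_points_saved(dfga: float, dfg_pct_diff: float) -> float:
    """Blocks replacement: DFGA * DFG% differential * 2 points per rim
    attempt. dfg_pct_diff is positive when opponents shoot worse at the
    rim against this defender than expected."""
    return dfga * dfg_pct_diff * 2.0

def synergy_playtype_poe(play_types: list[dict],
                         league_ppp: dict[str, float]) -> float:
    """Points over expectation from Synergy play types: for each play
    type, credit (player PPP - league-average PPP for that play type)
    times the possessions the player used on it, then sum. A player who
    lives on hard play types and still scores efficiently grades out
    well here."""
    return sum(pt["poss"] * (pt["ppp"] - league_ppp[pt["type"]])
               for pt in play_types)
```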

IMPACT COMPONENT

The big change was here: I used an adjusted, time-decayed RAPM, where the decay starts before the start of the current season and does not reach beyond two seasons prior. (So the current season, say 2024, would be weighted fully, and the model would not even look at 2021 data.) Time-decayed just means you weight games less the further in the past they are; this is done to account for things beyond the recent past, like offseason work and improvement (or decline!).
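
I won't pretend the exact decay function matters much for understanding, but for concreteness, here is a minimal sketch assuming an exponential form; only the "current season fully weighted, hard cutoff two seasons back" behavior is exactly as described above.

```python
import numpy as np

def stint_weight(seasons_back: int, half_life: float = 1.0) -> float:
    """Assumed exponential decay by recency. seasons_back=0 is the
    current season (e.g. 2024), weighted fully; more than 2 seasons
    back (e.g. 2021) is dropped from the model entirely."""
    if seasons_back == 0:
        return 1.0
    if seasons_back > 2:
        return 0.0
    return float(np.exp(-np.log(2.0) * seasons_back / half_life))

# The weights enter the ridge solve as a diagonal matrix W:
#   beta = prior + (X'WX + lam*I)^(-1) X'W(y - X @ prior)
```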

Why do it this way; isn't it better to just look at the current year for a current-year metric? In practice, time-decayed and multi-year RAPM with less weight on previous years is similar to PIRAPM. PIRAPM is RAPM with the previous years' RAPM as the prior (what you yell at your friend) instead of 0. These results generally look much better than raw single-year RAPM, especially on the noise front.

An example is important here. Here is a pastebin post I found of 2014 raw RAPM from around the end of the regular season: https://pastebin.com/gT2aN0P5 - yes, that's Miami LeBron at 36th. This was posted by J.E. somewhere, who is basically the RAPM god, so it was done right. PI RAPM uses the playoffs too, but there Miami LeBron is first. In general, time-decayed RAPM is also far more predictive than single-year RAPM, whether you run it raw or do luck adjustments like BBI likes to. It's actually very comparable to All in Ones, maybe even favorably so against the best of them.

I did luck adjustments on free throws (I consider FT offensive rebounds a continuation of the possession if the players on the court do not change, so this did not hurt those players), and a VERY minor one on 3-pointers, with less of one on offense. (Controversial, of course, but doing it without would have yielded the same results given how low the magnitude was. I think Ryan Davis's set used 50% and BBI's is a bit higher; mine was around 25% overall, which in hindsight likely didn't change anything.)
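
For concreteness, a ~25%-strength luck adjustment usually looks like the blend below; treat this as an illustration of the idea rather than my exact implementation.

```python
def luck_adjust(actual_pts: float, expected_pts: float,
                strength: float = 0.25) -> float:
    """Move observed scoring partway toward expectation before computing
    stint margins. strength=0.5 is roughly the Ryan Davis setting
    mentioned above; 0.25 is the very mild setting I used. For 3s,
    expected_pts might be attempts * long-run 3P% * 3."""
    return (1.0 - strength) * actual_pts + strength * expected_pts
```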

Fundamentally, though, there are some valid concerns about time-decayed RAPM and the bias it can create, which I was worried about beforehand as well. I won't sit here and claim these concerns are completely invalid, but I will try to show with evidence below that they aren't as worrisome as one might think. For reference, Shai is 2nd in 2024 "MAMBA."

HOW DO THESE THINGS FIT TOGETHER

The Box Score Component reduces that past-year bias because it only takes stats from the current year. It's like a left hook as you're falling from a right hook. To go into the overall benefits and address obvious potential concerns: you can essentially set how much the model regresses to the priors (how much your friend listens to your numbers). TDRAPM creates a larger, more stable sample, meaning you can set that regression to be less strict, and you can get away with SPMs that capture fewer connections, whereas normally you absolutely have to NAIL having the top guys super high so it passes the sniff test and works for messaging. Really, you still have to do that to an extent, but there's just less reliance on it, if that makes sense. Essentially, it's mutual enhancement: the larger sample and the SPM help each other out.

I do think there are individual cases where weighting the prior year may hurt; a huge concern is whether it can capture players who made big jumps. Therefore, I got every MIP and their ranking in MAMBA (that's closer here), EPM, and LEBRON.

MIP Rankings for All 3 Metrics

Note: Green = Lowest, Red = Highest Ranked

THIS DOES NOT MEAN GREEN IS GOOD AND RED IS BAD, JUST TO SHOW THIS DOES NOT UNDERSHOOT GUYS WHO IMPROVED A TON

To be clear, this isn't to say green = good in the sense of being more accurate; it's more to show that "MAMBA" does not have issues with players who had big jumps or changes in performance. (Perhaps it would be better to look at players who had massive RAPM changes, but that seemed less practical and a less clear message.) In theory, MIP = biggest jump; in practice, of course, that may not be the case.

Now, I won't handwave this concern away; it absolutely creates a bias. The way I'd frame the value-add is that you alleviate the bias the box score SPM can create, in exchange for some bias from weighting the previous year.

Last thing before I show the metric's performance and accuracy in testing: some will tout their All in One as the literal impact a player had, and many of these people are far smarter and more qualified than me. Even though I think that even in this proof-of-concept form this is at least somewhat competitive with the stronger public All in Ones, here is my stance: just like we use (practically) raw impact as a player evaluation tool to estimate true impact, All in Ones are all ESTIMATIONS of true impact. I'm never going to say "Player X was the 7th-most impactful player" because he was 7th in MAMBA, or any other All in One, because they are all estimations. EPM and LEBRON do the best in testing (not including MAMBA :P), but they can vary by 100 spots or more for some players. And it's not even necessary that any of them are right on an individual player; maybe they all miss the plot! That said, if they all paint a clear picture of a player's impact not matching your preconceived notion of him, that may be a sign that it's something to look into, but it is NOT irrefutable proof that you are wrong.

METRIC TESTING

To test this, I used retrodiction testing, with methodology similar to how the EPM and LEBRON creators tested their metrics. Mine is most similar to Krishna Narsu's (LEBRON's creator; he used projected minutes and I used actual minutes, though he did run actual minutes separately). Essentially, I took each player's "All in One score," multiplied it by his minutes the next year, and grouped players by team. Summing gives a "team score," and I then got the R^2 (correlation squared) to team wins the next year. R^2 = how well it explains variance, but just think of it as how well the metric can tell which teams are better than which (don't destroy me for explaining it like that, stats people, it's just for simplicity :D). Low-minute players were given replacement-player values; rookies were given a tad better than that, but still strongly negative.
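
Here is a sketch of that retrodiction loop, assuming pandas DataFrames with illustrative column names; the replacement and rookie values are placeholders, since all I said above is "replacement level" and "a tad better but still strongly negative."

```python
import pandas as pd

def retrodiction_r2(players: pd.DataFrame, team_wins_next: pd.Series,
                    replacement: float = -2.5, rookie: float = -2.0) -> float:
    """players columns: score (the all-in-one value), team (the player's
    team the following season), minutes_next, low_minutes (bool),
    rookie (bool). team_wins_next is next-season wins indexed by team."""
    df = players.copy()
    df.loc[df["low_minutes"], "score"] = replacement  # placeholder value
    df.loc[df["rookie"], "score"] = rookie            # placeholder value
    team_score = (df["score"] * df["minutes_next"]).groupby(df["team"]).sum()
    # R^2 = squared correlation between team score and next-season wins
    return float(team_score.corr(team_wins_next.loc[team_score.index]) ** 2)
```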

How did it perform? Keep in mind, this is a stage-1, proof-of-concept version. Most of these metrics went through lengthy development and testing before they were deployed, and were subsequently edited and further improved over years of real-world use. This was quite literally the first batch of results I made (I made 4 batches, differing in how heavily I told the model to weigh the box scores). If you take out the time spent collecting data, running the model, testing the metrics against each other (I did not test more than once), and the longest part of all, writing this, then of the week I've spent on this, genuinely only about two days went into actually building the model itself, with very little fine-tuning done.

With those caveats (excuses) out of the way: it performed the best by a decent margin. It was the best in 5 of 9 years, including 2 of 3 out-of-sample years. It performed especially well out of sample: LEBRON had an average R^2 of 59.2, EPM 58.73, and mine 64.8. My overall results were similar to those shown on Krishna Narsu's Twitter (for LEBRON and EPM), so I doubt any large testing deviation or error occurred. (The "it would be the, Compared to what Steve suggested" post.)

It likely also performed better than the unreleased metric LAZYLEBRON, based on the relative results. That was a BBI metric that was incredibly predictive but produced strange individual results (like Steven Adams, Caruso, Delon Wright, and Capela all top 10 in 2022) and was therefore never released (all of that info is from his Twitter). A multi-year version of this metric would be interesting (multi-year here = getting MAMBA from 2020, 2021, and 2022, and using that to test 2023), as that was the multi-year predictive setup tested in the thread below (it wasn't just LEBRON or EPM using multi-year data; it was using different years of LEBRON or EPM values). The thread below has Krishna Narsu's results testing the metrics. Assuming the gaps are around the same, a 0.03 R^2 gap is at least seemingly notable.


https://x.com/knarsu3/status/1763321501766627328

Now, as cool as it would be to be able to replicate this:

I will make this very clear: this is a first draft of a metric. More than that, I now have much more appreciation for what goes into making All in Ones, and for the balance between "predictive accuracy" and "players have to make sense." To be fully transparent, here are the general things I think need improvement:

  • I want more emphasis on bigs; perhaps separate models? Doing a KNN to get different groups of defenders sounds nice, but probably won't actually work with this kind of model.

  • It likely undershoots AD and Giannis defensively. I want to incorporate blocks into the Rim Points Saved metric somehow; having them as separate predictors causes multicollinearity shenanigans, and interaction effects create no-nos, but I can think of a few ways to fold blocks in. (So blocks aren't currently part of it, just rim points saved.)

  • Synergy POE was a very good addition, but it might undersell guys who are efficient because they create crazy-good opportunities with their own movement versus their team creating them; AD and Wemby provide unique value there. If I could separate rolls and pops that would be nice, but I don't have the Synergy API.

  • It will generally struggle to identify really good players on teams that can be elite in the regular season without them (KD on the Warriors, Kawhi on the Raptors). Because it is less reliant on box scores than some other All in Ones (I guess that's somewhat of a niche it fills, lol), certain players like that might be undervalued. This is somewhat true for all metrics, though.

  • This is the first run; playing with different box score weightings and decay values is on the agenda, but I want to focus on the SPM portion of the metric, particularly the defensive part, because that is certainly still in stage 1. Perhaps different box score weightings for defense and offense, too?

    Strange Individual Results

    All in Ones always have some weird results at the individual level, so before this goes into the trash because Player X ranked a bit weirdly, I briefly went through some weird ones from a very cursory look and compared them to other metrics. If they all share a similar weird result, that isn't to say they're all right; it's to say it's an All in One thing rather than a flaw with MAMBA specifically. That said, there are certainly some players I think my metric will "get wrong" more than other ones will, which applies to all of them, of course. Here is a breakdown of some odd results and some general commentary.

  • 2015: MAMBA: LeBron 6, George Hill 7. (EPM: LeBron 5, George Hill 7.) (LEBRON: LeBron 4, George Hill 15.)

    Note: 2015 to 2017 was interesting because LeBron shot up the less I weighed the box score, kind of a "this type of stuff undersells him" vibe. Taking the 3 years together, he was overall #1 in the impact component by a lot (Curry 15-17 was the #2 stretch from 2014 to 2024, I think, or something like that). It also gets AD very wrong; it's very low on him for some reason, which I disagree with.

  • 2016: MAMBA: LeBron 4. (EPM: 4.) (LEBRON: 2.)

  • 2017: MAMBA: Durant 8. (EPM: 14.) (LEBRON: 9.)

    He obviously should be much higher; this mostly comes from the Warriors doing well when he was off the court.

  • 2018: MAMBA: AD 15, KD 16. (EPM: AD 3, KD 15.) (LEBRON: AD 8, KD 12.)

    I think AD should be top 5 with the regular season he had, but yeah, mine is insanely low on him for some reason; I think it's a flaw.

  • 2019: MAMBA: Kawhi 15. (EPM: Kawhi 15.) (LEBRON: Kawhi 12.) Player of the year, of course; it's just low on him because Toronto did well without him at times, and this is an impact thing.


  • 2020: MAMBA: Kemba 8. (EPM: Kemba 43.) (LEBRON: Kemba 23.)

    This is a "what the hell" result, like EPM having Nurk or Zu top 10 in some years. LEBRON actually does a good job of not having really odd players top 10 consistently.

  • 2022: MAMBA: Luka 18. (EPM: Luka 17.) (LEBRON: Luka 8.)

    All of these undershoot him, but EPM and MAMBA absurdly so; LEBRON does the best here, but obviously Luka is Luka.

  • 2023: MAMBA: Luka 11. (LEBRON: Luka 7.) (EPM: Luka 7.)

    Same as the prior year, except mine stands out in undershooting him.

  • 2024: MAMBA: Giannis 7. (EPM: Giannis 4.) (LEBRON: Giannis 2.)

    Relevant: MAMBA defense: Giannis +1.2 (87th); EPM defense: Giannis +1.8 (72nd); LEBRON defense: Giannis +0.9 (64th).

    EPM and LEBRON are both reasonable rankings; mine is not. There's a really obvious glaring issue with Mitchell above him on defense in mine; on EPM they are actually fairly close (Mitchell at +1.5, Giannis at +1.8), and in LEBRON they are further apart, with Mitchell at +0.2 and Giannis at +0.9. But all of them are a bit lower on Giannis defensively this past year (which I don't agree with, to be clear). That said, I would say mine did a good job ranking his defense during his DPOY seasons and 2021. On EPM: 2019 24th, 2020 9th, 2021 64th. On LEBRON: 2019 3rd, 2020 7th, 2021 5th. On MAMBA: 2019 8th, 2020 2nd, 2021 5th. EPM stands out as a bit odd there; LEBRON and MAMBA do similar jobs, with MAMBA probably undershooting his 2019 DPOY campaign and LEBRON undershooting his 2020 DPOY campaign. Knowing how his raw RAPM and impact data looked during this stretch, I would personally argue the closer to 1st or 2nd the better. Not sure what to make of this.

    AD is off in a lot of these. It also has Jokic way too low in 2021, at like 7th, but has him #1 every year since, by much more than the other metrics do.

    This might look bad, but I'm literally looking for things I find stupid about my own metric; you could likely do that for any of them. (I think 2024 Curry is like 25th in LEBRON? 12th in mine and in EPM, though LEBRON generally looks really solid at the top for sure.) The point isn't to disparage anyone, or any number; the point is that all of these metrics have some weird individual results, mine included. I absolutely think mine is flat-out missing on many of these, and hopefully once I edit the box scores that issue will be mitigated. Pretty much all of these players (aside from Kemba and Hill) I consider top 1-4 in those years.

  • I think All in Ones are fantastic tools, but they aren't a "how good is this guy" metric. A guy ranked way lower than expected on a team that functions well without him isn't necessarily a bad sign for that player, because impact comes just as much from "they get way better when you're there" as from "they suck when you sit."

  • There is seemingly a tradeoff between how accurate/predictive a metric is and some really odd individual results. Based on the accuracy tests at face value, I think the tradeoff is worth it, at least considering what seemingly happened with LAZYLEBRON's results.

  • A WNBA version of this exists, and it is not public because I am in an internship (and hopefully will become full time!). It tests better than anything available, but for the WNBA specifically I much prefer LEBRON right now for its low-sample padding.

  • In terms of not having glaring players at the top or a top-tier guy way too low, I think LEBRON does the best job, off a quick glance.

  • I'm pretty fine saying the "misses" here are simply misses, caused by bias or by noise clouding reality. Many people in the analytics space might disagree with me holding that opinion when all the All in Ones agree on something, but that's just my personal take in situations where it doesn't pass the sniff test. I do think those discrepancies are worth looking into when they exist; it's just that All in One results that really deviate from general opinion are more a potential signal than some proven, irrefutable answer, if that makes sense.

    FINALLY THE IMPACT METRIC

WEBSITE FOR INTERACTIVE TABLE (Preview Below) https://timotaij.github.io/LepookTable/

I posted an Excel file below, but the link above is a more appealing and responsive viewing format.

https://docs.google.com/spreadsheets/d/1ZMR47Z8MDX9Tt7oQy5p5vzkwLznt9ROc/edit?gid=147787302#gid=147787302 < Spreadsheet Format

NOTE: Players who played under 200 minutes in a season may not be shown correctly, but that was not a problem for the metric testing

So what does this mean? Did I create some new super metric or whatever that towers over the competition?

NO

Testing and out-of-sample testing is cool and all, but at the end of the day it isn't the same as legitimate real-world results after release. To be clear, this isn't a case where I kept rebuilding the model and rerunning it until I got good correlations; this was all within the first batch of results I got.

More than that, this is the "proof of concept" phase. My hope is that some of the odd individual results will be fixed as the metric is refined. I have things I blatantly need to work on: the offensive results were great in testing, but the defensive results weren't quite as stellar (I ran retrodiction testing on offense and defense separately). They were still good, but a good deal worse than EPM, and defense was where I thought this would shine. This is likely due to issues with my SPM; this methodology can handle a weaker SPM in general, but that is not an excuse to have one that is not particularly good.

For me, this serves best as a proof of concept for this type of framework. In its current stage, based on its testing, I would tentatively say it is likely comparable to LEBRON or EPM, even if it perhaps has some odder results at the top end from time to time. Maybe it does better at predicting players outside the top 10? But EPM and LEBRON are essentially perfected versions of what they are within their frameworks, while this is very much scratching the surface of its framework. Each will have certain flaws, misses, and biases, but beyond some exciting test results, I think it's valuable to know more clearly where those biases come from, even if a metric has less overall "incorrect" bias in that regard (here, the last-year stuff).

Obviously this wasn't the most formal post, but if you have any questions, comments, or concerns, or just want to reach out: timothycwijaya@gmail.com, Timotaij on Instagram, Teemohoops on Twitter, and my LinkedIn (Timothy Wijaya) are probably the best places to reach me.

  • Note: Caveats about All in Ones from a more philosophical standpoint are beyond the scope of this post, but that's a very interesting discussion.

  • Note: This list does not represent how I would rank players, AT ALL.

  • Note: As I said, this is a first draft of a metric.

  • Note: The public sphere matters here; I'm sure teams have better versions of these in-house.

  • The WNBA version was built without the Synergy inputs, which I will have to add by downloading CSV files manually, and of course has no tracking data. It performed better than any All in One in the WNBA scene by a large margin, although I prefer LEBRON for the WNBA (LEBRON is not readily available for the WNBA; it's similar in predictive accuracy there).

  • The long version of this post (it is unedited, has many grammar errors, rambles much more, and isn't super professional). Perhaps just the first part, with the explanation and breakdown of All in Ones, the justification of my caveats with them, and the bias issues I mentioned, is useful though: https://www.teemohoop.com/mamba-or-lepookie

  • Huge thanks to Seth Partnow and Ben Alamar for their insights during the Las Vegas SBC program (we did not talk about this specifically, but they helped me a ton with how to approach my internship), to Eli Horowitz for giving me a chance with the internship opportunity, as I likely wouldn't be able to do anything in basketball without it, and to Nathan Hollenberg for helping me with some questions I had on RAPM samples and for all the wonderful advice he gave me during our coffee chat!
